Motivation


►  U.S.  Army  currently  wages 
asymmetric  battles  against 
insurgencies 


►  Enemy  is  hard  to  detect 

°  Knowledge  of  local  terrain 
°  Ability  to  mix  in  with  the 
civilian  population 


►  Enemy  quickly  adapts  to 
Army  tactics  and  strategies 
Motivation (cont)


►  The  needs  of  Soldiers 
change  in  response  to 
new  insurgent  strategies 

►  Real-time  adaptive  team 
responses  to  insurgent 
threats  are  key  to  mitigate 
the  risk  in  uncertain  and 
dynamic  battle  spaces 
Research Objective

►  Goal:  Develop  ways  for  teams  to  learn  optimal 
game  strategies,  even  under  changing 
mission  requirements  and  team  objectives 

►  Problem: Centralized formulation of multi-agent games is complex and needs global data. Can we decentralize the dynamics in multi-agent games and still achieve optimal performance?


Outline 

►  Background  Information 

°  Game  Theory  for  Multi-Agent  Systems  (MAS) 

°  Graph  Theory  for  Communication  Graphs 
°  Synchronization  Control  Design  Problem 

►  Cooperative  Optimal  Control 

°  Local  Performance  Functions  for  Team  Behaviors 
°  Distributed  Hamilton-Jacobi  (HJ)  Equation 

►  Multi-Agent  Game  Distributed  Solution 

°  Reinforcement  Learning  Solution 
°  Online  Solution  using  Neural  Networks 
°  Simulation  Results 


Background  Information 


Game  Theory  for  MAS 

►  MAS  comprised  of  autonomous  agents  that 
cooperate  to  meet  a  system-level  objective 

►  Game  Theory  used  to  model  the  strategic 
behavior  of  MAS 

°  Outcomes depend not only on an agent's own actions, but also on the actions of every other agent

°  Each agent chooses a strategy that independently optimizes its own performance objectives without knowledge of the other agents' strategies

►  Team  decisions  normally  solved  offline 

°  Coupled  Riccati  equations  for  linear  systems 
°  Coupled Hamilton-Jacobi equations for non-linear systems


Graphs  for  Communications 


►  Consider a graph $Gr = (V, E)$ with:

•  Nonempty set of $N$ agents $V = \{v_1, \ldots, v_N\}$

•  Set of edges $E \subseteq V \times V$

•  Connectivity matrix $E = [e_{ij}]$

•  Set of neighbors $N_i$ of agent $i$

•  In-degree matrix $D = \mathrm{diag}(d_i)$, with $d_i = \sum_{j \in N_i} e_{ij}$

►  Define the graph Laplacian:

$L = D - E$


►  If the graph is strongly connected, there is no permutation matrix $U$ such that:

$L = U \begin{bmatrix} * & 0 \\ * & * \end{bmatrix} U^T$
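The graph quantities above can be illustrated with a short sketch. It assumes the convention that `E[i][j] > 0` means agent i receives information from agent j, and it uses a hypothetical three-node directed cycle as the example graph; the function names are illustrative only.

```python
from collections import deque

def in_degrees(E):
    """d_i = sum of row i of the connectivity matrix E."""
    return [sum(row) for row in E]

def laplacian(E):
    """Graph Laplacian L = D - E, with D = diag(d_i)."""
    n = len(E)
    d = in_degrees(E)
    return [[(d[i] if i == j else 0) - E[i][j] for j in range(n)]
            for i in range(n)]

def reachable(E, start, reverse=False):
    """Nodes reachable from `start` along edges j -> i (E[i][j] > 0).

    With reverse=True the edges are followed backwards.
    """
    n = len(E)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in range(n):
            weight = E[u][v] if reverse else E[v][u]
            if weight > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(E):
    """True if every node reaches, and is reached from, node 0."""
    n = len(E)
    return (len(reachable(E, 0)) == n
            and len(reachable(E, 0, reverse=True)) == n)

# Hypothetical example: directed cycle 0 -> 1 -> 2 -> 0.
cycle = [[0, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
# Hypothetical counterexample: chain 0 -> 1 -> 2 (no path back to node 0).
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
```

Every row of the Laplacian sums to zero by construction, which is a quick sanity check on any connectivity matrix.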


Synchronization  Problem 

►  Consider $N$ agents on $Gr$ with dynamics

$\dot{x}_i = A x_i + B_i u_i$, $x_i(t) \in \mathbb{R}^n$, $u_i(t) \in \mathbb{R}^{m_i}$, $A \in \mathbb{R}^{n \times n}$, $B_i \in \mathbb{R}^{n \times m_i}$

►  Target node is $x_0(t) \in \mathbb{R}^n$, which satisfies the dynamics $\dot{x}_0 = A x_0$

►  Synchronization Problem: design local control protocols for all agents in $Gr$ to synchronize to the target node: $x_i(t) \to x_0(t), \ \forall i$


Synchronization  Problem  (cont) 


►  Cooperative  team  objectives  can  be  described 
in  terms  of  the  local  neighborhood  tracking 
error  (LNTE) 


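$\delta_i = \sum_{j \in N_i} e_{ij} (x_i - x_j) + g_i (x_i - x_0)$

►  Dynamics of the LNTE

$\dot{\delta}_i = \sum_{j \in N_i} e_{ij} (\dot{x}_i - \dot{x}_j) + g_i (\dot{x}_i - \dot{x}_0)$

$\dot{\delta}_i = A \delta_i + (d_i + g_i) B_i u_i - \sum_{j \in N_i} e_{ij} B_j u_j$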
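As a minimal sketch of the local neighborhood tracking error, the code below evaluates the error and its rate for agents with scalar states. The scalar simplification, the function names, and the numbers in the example are assumptions made for illustration; `g[i]` is the weight on agent i's error to the target node.

```python
def tracking_errors(E, g, x, x0):
    """delta_i = sum_j e_ij (x_i - x_j) + g_i (x_i - x0), scalar states."""
    n = len(x)
    return [sum(E[i][j] * (x[i] - x[j]) for j in range(n))
            + g[i] * (x[i] - x0)
            for i in range(n)]

def error_rates(E, g, a, b, delta, u):
    """delta_dot_i = a delta_i + (d_i + g_i) b_i u_i - sum_j e_ij b_j u_j."""
    n = len(delta)
    d = [sum(E[i]) for i in range(n)]
    return [a * delta[i]
            + (d[i] + g[i]) * b[i] * u[i]
            - sum(E[i][j] * b[j] * u[j] for j in range(n))
            for i in range(n)]

# Hypothetical example: three agents on a directed cycle, agent 0 sees the target.
E = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
g = [1, 0, 0]
x, x0 = [1.0, 2.0, 4.0], 0.5
a, b, u = 0.5, [1.0, 2.0, 1.0], [0.1, -0.2, 0.3]

delta = tracking_errors(E, g, x, x0)
rates = error_rates(E, g, a, b, delta, u)

# Cross-check: differentiate the definition directly using x_dot = a x + b u.
x_dot = [a * x[i] + b[i] * u[i] for i in range(3)]
rates_from_definition = tracking_errors(E, g, x_dot, a * x0)
```

The two rate computations agree because the error is linear in the states, which is the step that takes the first dynamics equation to the second. When every agent sits on the target state, all errors are zero.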


Cooperative  Optimal  Control 

Multi-Agent  Games  on  Graphs 


Local  Cost  Function  for  Teams 


►  Goal:  To  achieve  synchronization  while 
optimizing  some  performance  measures  on 
the  agents 


Local  Cost  Function 


$J_i(\delta_i(0), u_i, u_{-i}) = \int_0^\infty \left( \delta_i^T Q_{ii} \delta_i + u_i^T R_{ii} u_i + \sum_{j \in N_i} u_j^T R_{ij} u_j \right) dt$

$Q_{ii} > 0, \ R_{ii} > 0, \ R_{ij} > 0$


Local  Value  and  Hamiltonian 

►  Let  us  interpret  the  control  input  as 
policies/strategies 


Local  Value  Function 


$V_i(\delta_i(t), \delta_{-i}(t)) = \int_t^\infty \left( \delta_i^T Q_{ii} \delta_i + u_i^T R_{ii} u_i + \sum_{j \in N_i} u_j^T R_{ij} u_j \right) dt$


►  Local  Hamiltonian  Function 


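$H_i(\delta_i, u_i, u_{-i}) \equiv \left( \frac{\partial V_i}{\partial \delta_i} \right)^T \left( A \delta_i + (d_i + g_i) B_i u_i - \sum_{j \in N_i} e_{ij} B_j u_j \right) + \delta_i^T Q_{ii} \delta_i + u_i^T R_{ii} u_i + \sum_{j \in N_i} u_j^T R_{ij} u_j = 0$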


Local  Nash  Equilibrium 


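►  The control objective of agent $i$ is to find the optimal strategy and smallest value:

$V_i^*(\delta_i(t), \delta_{-i}(t)) = \min_{u_i} \int_t^\infty \left( \delta_i^T Q_{ii} \delta_i + u_i^T R_{ii} u_i + \sum_{j \in N_i} u_j^T R_{ij} u_j \right) dt$

►  A Nash equilibrium solution for a finite $N$-agent distributed game is an $N$-tuple of strategies $\{u_1^*, \ldots, u_N^*\}$ where:

$J_i^* \equiv J_i(u_i^*, u_{-i}^*) \le J_i(u_i, u_{-i}^*), \quad i \in N$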


Distributed  HJ  Equation 


►  Using the stationarity condition $\partial H_i / \partial u_i = 0$ to find the optimal control:

$u_i = -\frac{1}{2} (d_i + g_i) R_{ii}^{-1} B_i^T \frac{\partial V_i}{\partial \delta_i}$

►  Substitute into the Hamiltonian to get the distributed Hamilton-Jacobi (HJ) equation
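As a worked step, the stationarity condition and the substitution can be written out explicitly. This is a sketch that assumes each $R_{ii}$ is symmetric and that the cost carries no factor of 1/2; the shorthand $\nabla V_i$ for $\partial V_i / \partial \delta_i$ is introduced here for brevity.

```latex
% Stationarity of the local Hamiltonian with respect to u_i
\frac{\partial H_i}{\partial u_i}
  = (d_i + g_i) B_i^T \nabla V_i + 2 R_{ii} u_i = 0
\quad\Longrightarrow\quad
u_i = -\tfrac{1}{2} (d_i + g_i) R_{ii}^{-1} B_i^T \nabla V_i

% Substituting every agent's control into H_i = 0
0 = \nabla V_i^T \Big( A \delta_i
      - \tfrac{1}{2} (d_i + g_i)^2 B_i R_{ii}^{-1} B_i^T \nabla V_i
      + \tfrac{1}{2} \sum_{j \in N_i} e_{ij} (d_j + g_j) B_j R_{jj}^{-1} B_j^T \nabla V_j \Big)
  + \delta_i^T Q_{ii} \delta_i
  + \tfrac{1}{4} (d_i + g_i)^2 \nabla V_i^T B_i R_{ii}^{-1} B_i^T \nabla V_i
  + \tfrac{1}{4} \sum_{j \in N_i} (d_j + g_j)^2
      \nabla V_j^T B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^T \nabla V_j
```

The terms in $\nabla V_j$ are what couple agent $i$'s equation to its neighbors.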
Distributed HJ Equation (cont)

►  There  is  one  coupled  HJ  equation  corresponding 
to  each  agent. 

►  Therefore,  a  solution  to  this  multi-agent  game 
problem  requires  a  solution  to  N coupled  partial 
differential  equations. 

►  Next,  we  show  how  to  solve  this  online  in  a 
distributed  way 

°  Each  agent  requires  only  information  from  neighbors 
°  Use  techniques  from  reinforcement  learning 


Distributed Solution of the Multi-Agent Game Using Reinforcement Learning


Reinforcement  Learning  (RL) 

►  RL  is  concerned  with  how  to  methodically 
modify  the  actions  of  an  agent  based  on 
observed  responses  from  its  environment. 

►  In game theory, RL is considered a boundedly rational interpretation of how equilibrium may arise.

►  One  technique  that  has  been  developed  from 
RL  research  in  controls  is  Policy  Iteration  (PI) 


Policy  Iteration  (PI) 

►  A  class  of  two-step  iteration  algorithms: 

policy  evaluation  and  policy  improvement 

°  Evaluation:  Apply  a  control.  Evaluate  the  benefit  of 
that  control. 

°  Improvement:  Improve  the  control  policy. 

►  In  control  theory,  PI  algorithms  amount  to: 

°  Learning  the  solution  to  a  non-linear  Lyapunov 
equation 

°  Updating  the  policy  by  minimizing  a  Hamiltonian 
function 
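The two steps can be sketched for the simplest possible case: a single agent with scalar linear dynamics, where the Lyapunov equation has a closed-form solution. This is an illustration under stated assumptions, not the multi-agent algorithm; the system `x_dot = a x + b u`, the cost integrand `q x^2 + r u^2`, the quadratic value `V(x) = p x^2`, and all function names are hypothetical choices.

```python
import math

def policy_iteration(a, b, q, r, k0, iters=20):
    """Scalar policy iteration with policy u = -k x.

    Returns the final value weight p, the final gain k, and the list of
    value weights produced by each evaluation step.
    """
    k = k0
    if a - b * k >= 0:
        raise ValueError("initial policy must be stabilizing")
    history = []
    for _ in range(iters):
        # Policy evaluation: V(x) = p x^2 solves the Lyapunov equation
        # 0 = q x^2 + r u^2 + dV/dx * (a x + b u) with u = -k x.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        history.append(p)
        # Policy improvement: minimize the Hamiltonian over u, which
        # gives u = -1/2 * (1/r) * b * dV/dx = -(b p / r) x.
        k = b * p / r
    return p, k, history

def riccati_solution(a, b, q, r):
    """Positive root of 0 = q + 2 a p - (b^2 / r) p^2, for comparison."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

p, k, history = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
```

Each evaluation can only lower the value weight, and the iterates settle on the Riccati solution that an offline method would compute directly.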




Offline  PI  Algorithm 

►  To  solve  the  multi-agent  game  in  a 
distributed  way,  the  value  functions  must  be 
parameterized. 

►  However,  in  our  case,  it  is  not  clear  what 
parametric  form  the  value  should  take  in  the 
Hamiltonian. 

►  The  value  function  needs  to  be  in  terms  of 
local  variables  in  order  to  use  a  local  solution 
procedure 


Offline  PI  Algorithm  (cont) 

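►  Step 0: Start with stabilizing initial policies $u_1^0, \ldots, u_N^0$

►  Step 1: Given the $N$-tuple of policies, solve for the costs $V_1^k, V_2^k, \ldots, V_N^k$

$0 = \delta_i^T Q_{ii} \delta_i + (u_i^k)^T R_{ii} u_i^k + \sum_{j \in N_i} (u_j^k)^T R_{ij} u_j^k + \left( \frac{\partial V_i^k}{\partial \delta_i} \right)^T \left( A \delta_i + (d_i + g_i) B_i u_i^k - \sum_{j \in N_i} e_{ij} B_j u_j^k \right)$

$V_i^k(0) = 0, \quad i \in N$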


Offline  PI  Algorithm  (cont) 


►  Step 2: Update the $N$-tuple of control policies by minimizing the Hamiltonian:

$u_i^{k+1} = -\frac{1}{2} (d_i + g_i) R_{ii}^{-1} B_i^T \frac{\partial V_i^k}{\partial \delta_i}, \quad i \in N$

►  Step 3: Increment $k$ and return to Step 1 until convergence


Online  Solution  using  Neural  Nets 

►  Online  solution  uses  an  Actor-Critic  method 

°  Actor:  selects  the  policy  of  the  agent 
°  Critic:  criticizes  the  policy  of  the  actor 

►  The  output  of  the  Critic  drives  the  learning 
for  both  the  Actor  and  Critic 

►  In  this  solution,  Actors  and  Critics  are  neural 
networks  (NNs) 

°  Approximate  value  functions  and  their  gradients 
°  Use  proper  approximator  structures 


Value  Function  Approximator  (VFA) 


►  Assumption:  For  each  admissible  policy,  the 
non-linear  Lyapunov  equations  have  smooth 
solutions 


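►  Critic NN

$\hat{V}_i(\delta_i) = \hat{W}_i^T \phi_i(\delta_i)$

►  Actor NN

$\hat{u}_{i+N} = -\frac{1}{2} (d_i + g_i) R_{ii}^{-1} B_i^T \nabla\phi_i^T \hat{W}_{i+N}$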
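A minimal sketch of the two approximators follows, assuming a two-dimensional error, a scalar control input, and a quadratic basis for the value. The basis choice, the argument names, and the example weights are assumptions made for illustration only.

```python
def phi(delta):
    """Quadratic basis functions of a two-dimensional error (d1, d2)."""
    d1, d2 = delta
    return [d1 * d1, d1 * d2, d2 * d2]

def grad_phi(delta):
    """Jacobian of phi: one row per basis function, one column per component."""
    d1, d2 = delta
    return [[2.0 * d1, 0.0],
            [d2, d1],
            [0.0, 2.0 * d2]]

def critic_value(W_critic, delta):
    """Critic output: V_hat = W^T phi(delta)."""
    return sum(w * f for w, f in zip(W_critic, phi(delta)))

def actor_control(W_actor, delta, B, r_inv, d_i, g_i):
    """Actor output: u_hat = -1/2 (d_i + g_i) r_inv B^T grad_phi^T W.

    B is a length-2 input vector and r_inv is the scalar inverse of R_ii.
    """
    J = grad_phi(delta)
    # grad_phi^T W is the gradient of the approximate value w.r.t. delta.
    grad_V = [sum(J[k][c] * W_actor[k] for k in range(3)) for c in range(2)]
    return -0.5 * (d_i + g_i) * r_inv * sum(B[c] * grad_V[c] for c in range(2))

# Hypothetical weights and operating point.
W = [1.0, 0.0, 2.0]
value = critic_value(W, (1.0, -1.0))
control = actor_control(W, (1.0, -1.0), B=[0.0, 1.0], r_inv=1.0, d_i=1, g_i=1)
```

Keeping separate weight vectors for the critic and the actor lets the two networks be tuned by different laws, even though both use the same basis.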
Online Cooperative Games


►  Update  Critic:  learn  the  value 


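$\dot{\hat{W}}_i = -a_i \frac{\sigma_{i+N}}{(\sigma_{i+N}^T \sigma_{i+N} + 1)^2} \left[ \sigma_{i+N}^T \hat{W}_i + \delta_i^T Q_{ii} \delta_i + \hat{u}_{i+N}^T R_{ii} \hat{u}_{i+N} + \sum_{j \in N_i} \hat{u}_{j+N}^T R_{ij} \hat{u}_{j+N} \right]$

with $\sigma_{i+N} = \nabla\phi_i \left( A \delta_i + (d_i + g_i) B_i \hat{u}_{i+N} - \sum_{j \in N_i} e_{ij} B_j \hat{u}_{j+N} \right)$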


►  Update  Actor:  learn  the  control  policy 
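The critic's learning step can be sketched for a single scalar agent, assuming the critic is tuned by normalized gradient descent on the Hamiltonian residual. Everything here is a simplification for illustration: the policy `u = -k * delta` is held fixed, the value is approximated as `w * delta^2`, the continuous-time law is discretized with the step size folded into `alpha`, and a sine sweep of error samples stands in for an exciting trajectory.

```python
import math

def train_critic(a, b, q, r, k, alpha=1.0, steps=5000):
    """Tune one critic weight w so that w * delta^2 is the cost of u = -k delta."""
    w = 0.0
    for n in range(steps):
        # Sweep of error samples; a stand-in for persistent excitation.
        delta = 2.0 * math.sin(0.1 * n)
        u = -k * delta
        # Regressor: grad_phi * delta_dot, with phi = delta^2.
        sigma = 2.0 * delta * (a * delta + b * u)
        # Hamiltonian residual of the current critic estimate.
        residual = sigma * w + q * delta * delta + r * u * u
        # Normalized gradient step on the squared residual.
        w -= alpha * sigma / (sigma * sigma + 1.0) ** 2 * residual
    return w

w_learned = train_critic(a=1.0, b=1.0, q=1.0, r=1.0, k=2.0)
# Closed-form cost weight of the same policy, for comparison.
w_exact = (1.0 + 1.0 * 2.0 ** 2) / (2.0 * (1.0 * 2.0 - 1.0))
```

If the error samples were allowed to decay to zero, the regressor would vanish and the weight would stop moving, which is why excitation matters for identifying the value.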
Some Remarks for Online Solution


►  We have provided a basis for tuning the actor/critic networks of all N agents simultaneously, meaning that teams can learn online in real time.

►  Persistence of excitation is needed for the proper identification of the value functions by the Critic NN

►  Nonstandard tuning algorithms are required to guarantee stability for the Actor NN

►  NN usage suggests starting with random, non-zero control weights


Simulation 


►  Node  2  can  receive 
orders  from  Node  1 

►  Node  2  does  not  have  a 
transmitter  strong 
enough  to  acknowledge 
the  order  directly. 

►  Thus Node 2 must use a router (Node 3), which, under a security protocol, cannot acknowledge Node 2 directly.
Simulation Results


►  Node  Dynamics 
►  Select $Q_{ii}$, $R_{ii}$, $R_{ij}$ as identity matrices. Results:
Summary

►  Posed  the  Synchronization  Control  Problem 

►  Derived  the  distributed  Hamilton-Jacobi 
equation  in  terms  of  local  value  functions 

►  Proposed distributed solutions to the Multi-Agent Game

°  Offline  Policy  Iteration  Algorithm 
°  Online  Solution  using  Actor/Critic  NNs 
Future Work


►  Develop  more  simulations  using  more  agents 
in  time-varying  graphs 

►  Extend  the  results  of  this  research  to  graphs 
with  a  spanning  tree  (i.e.  not  necessarily 
strongly  connected) 

►  Incorporate  concepts  of  trust  into  cooperative 
multi-agent  systems 
Questions?

Kyriakos  G.  Vamvoudakis 

kyriakos@arri.uta.edu


Dariusz  G.  Mikulski 

dgmikuls@oakland.edu, dariusz.mikulski@us.army.mil